Pascal/Delphi Source File | 1997-05-11 | 14KB | 642 lines
Specification for RTF
---------------------
RTF text is a form of encoding of various text
formatting properties, document structures,
and document properties, using the printable
ASCII character set. Special characters can also be
encoded this way, although RTF does not prevent the use
of character codes outside the ASCII printable set.
The main encoding mechanism of "control words" provides a name
space that may later be used to extend RTF with
macros, programming, etc.
1. BASIC INGREDIENTS
Control words are of the form:
\lettersequence<delimiter>
where <delimiter> is one of:
. a space: the space is part of the control word.
. a digit or -: a parameter follows. The following digit
  sequence is then delimited by a space or any other
  non-letter-or-digit, as for control words.
. any other non-letter-or-digit: terminates the control word,
  but is not a part of the control word.
By "letter" here we mean just the upper and lower case ASCII
letters.
Control symbols consist of a \ character followed by a single
non-letter. They require no further delimiting.
Notes: control symbols are compact, but there are not too many
of them. The number of possible control words is not limited.
The parameter is partially incorporated in control symbols, so
that a program that does not understand a control symbol can
recognize and ignore the corresponding parameter as well.
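The delimiter rules above can be sketched in Python as follows. This is only an illustration, not part of the specification: the function name read_control and its return shape (name, parameter, next position) are assumptions made for the example.

```python
import string

LETTERS = set(string.ascii_letters)   # "letter" means ASCII letters only
DIGITS = set("0123456789")


def read_control(text, pos):
    """Parse the control word or control symbol at text[pos] (a backslash).

    Returns (name, parameter_or_None, next_pos). Hypothetical helper.
    """
    if text[pos] != "\\":
        raise ValueError("expected a backslash")
    i = pos + 1
    if i >= len(text):
        raise ValueError("backslash at end of input")
    if text[i] not in LETTERS:
        # Control symbol: one non-letter, no further delimiting needed.
        return text[i], None, i + 1
    start = i
    while i < len(text) and text[i] in LETTERS:
        i += 1
    name = text[start:i]
    # A digit or '-' announces a parameter (a possibly signed number).
    param = None
    j = i + 1 if i < len(text) and text[i] == "-" else i
    k = j
    while k < len(text) and text[k] in DIGITS:
        k += 1
    if k > j:
        param = int(text[i:k])
        i = k
    # A space delimiter is part of the control word, so it is consumed.
    # Any other delimiter is left in place for the caller.
    if i < len(text) and text[i] == " ":
        i += 1
    return name, param, i
```

For instance, read_control applied to the text \fs24 followed by a space yields the name "fs", the parameter 24, and a position just past the space.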
In addition to control words and control symbols, there are also
the braces:
{ group start, and
} group end.
The text grouping will be used for formatting and to delineate
document structure - such as the footnotes, headers, title, and
so on. The control words, control symbols, and braces
constitute control information. All other characters in RTF text
constitute "plain text".
Since the characters \, {, and } have specific uses in RTF,
the control symbols \\, \{, and \} are provided to express
the corresponding plain characters.
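A writer therefore has to escape these three characters before emitting plain text. A minimal Python sketch, where the function name escape_plain is an assumption made for the example:

```python
def escape_plain(text):
    """Escape backslash and braces so they survive as plain text in RTF."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)   # becomes \\ or \{ or \}
        else:
            out.append(ch)
    return "".join(out)
```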
2. WHAT RTF TEXT MEANS (SEMANTICS)
The reader of an RTF stream will be concerned with:
. Separating control information from plain text.
. Acting on control information. This is designed to be
  a relatively simple process, as described below.
  Some control information just contributes special
  characters to the plain text stream. Other information
  serves to change the "program state", which includes
  properties of the document as a whole and also a stack
  of "group states" that apply to parts.
  Note that the group state is saved by the { brace and is
  restored by the } brace. The current group state specifies:
  1. the "destination", or part of the document that the
     plain text is building up.
  2. the character formatting properties - such as bold or
     italic.
  3. the paragraph formatting properties - such as justified.
  4. the section formatting properties - such as number of columns.
. Collecting and properly disposing of the remaining "plain text"
  as directed by the current group state.
In practice the RTF reader will proceed as follows:
0. read next char
1. if = {
   stack current state. current state does not change.
   continue.
2. if = }
   unstack current state from stack. this will change the
   state in general.
3. if = \
   collect control word/control symbol and parameter, if any.
   look up word/symbol in symbol table (a constant table)
   and act according to the description there. The different
   actions are listed below. Parameter is left available for use by
   the action.
   Leave read pointer before or after the delimiter, as appropriate.
   After the action, continue.
4. otherwise, write "plain text" character to current destination
   using current formatting properties.
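The loop above can be sketched in Python as follows. This is an illustration under assumptions that are not part of the specification: the names read_rtf and CONTROL are made up, the group state is reduced to two flags, only \b, \i, \plain and \par are looked up, and every other control (including destinations such as \rtf) is simply ignored.

```python
import re

# Control word: letters, optional signed number, optional single space.
# Control symbol: backslash followed by one non-letter.
CONTROL = re.compile(r"\\(?:([A-Za-z]+)(-?[0-9]+)? ?|([^A-Za-z]))")


def read_rtf(text):
    """Return a list of (text_run, bold, italic) tuples. Hypothetical sketch."""
    state = {"bold": False, "italic": False}
    stack = []
    runs = []

    def emit(ch):
        # Step 4: write plain text using the current formatting properties.
        key = (state["bold"], state["italic"])
        if runs and runs[-1][1:] == key:
            runs[-1] = (runs[-1][0] + ch,) + key
        else:
            runs.append((ch,) + key)

    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch == "{":
            stack.append(dict(state))   # step 1: save; state is unchanged
            pos += 1
        elif ch == "}":
            state = stack.pop()         # step 2: restore (fails if unbalanced)
            pos += 1
        elif ch == "\\":
            match = CONTROL.match(text, pos)
            if match is None:
                raise ValueError("backslash at end of input")
            word, _param, symbol = match.groups()
            pos = match.end()           # step 3: pointer left after delimiter
            if symbol is not None:
                if symbol in "\\{}":
                    emit(symbol)        # escaped plain character
            elif word == "b":
                state["bold"] = True
            elif word == "i":
                state["italic"] = True
            elif word == "plain":
                state["bold"] = state["italic"] = False
            elif word == "par":
                emit("\n")              # end of paragraph as a character
            # any other control is ignored in this sketch
        else:
            emit(ch)
            pos += 1
    return runs
```

Copying the state on { and popping it on } is what makes formatting set inside a group disappear when the group ends.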
Given a symbol table entry, the possible actions are as follows:
A. Change destination:
   change destination to the destination described in the entry.
   Most destination changes are legal only immediately after a {.
   Other restrictions may also apply (for example, footnotes
   may not be nested).
B. Change formatting property:
   The symbol table entry will describe the property and
   whether the parameter is required.
C. Special character:
   The symbol table entry will describe the character code.
   Go to step 4.
D. End of paragraph:
   This could be viewed as just a special character.
E. End of section:
   This could be viewed as just a special character.
F. Ignore
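One way to picture the constant symbol table is a mapping from control name to action kind. The Python sketch below is an assumption made for illustration: the names Action, SYMBOLS and lookup are hypothetical, the table holds only a handful of entries, and a real entry would also carry the destination, property, or character code it describes.

```python
from enum import Enum


class Action(Enum):
    DESTINATION = "A"
    PROPERTY = "B"
    SPECIAL_CHAR = "C"
    END_PARAGRAPH = "D"
    END_SECTION = "E"
    IGNORE = "F"


# A few sample entries taken from the lists in this document.
SYMBOLS = {
    "footnote": Action.DESTINATION,
    "header": Action.DESTINATION,
    "b": Action.PROPERTY,
    "fs": Action.PROPERTY,
    "tab": Action.SPECIAL_CHAR,
    "~": Action.SPECIAL_CHAR,
    "par": Action.END_PARAGRAPH,
    "sect": Action.END_SECTION,
}


def lookup(name):
    """Return the action for a control name; unknown names are ignored."""
    return SYMBOLS.get(name, Action.IGNORE)
```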
3. SPECIAL CHARACTERS
The special characters are explained as they exist in Mac Word.
Clearly, other characters may be added for interchange with other
programs. If a character name is not recognized by a reader, then
according to the rules described above it will simply be ignored.
\chpgn     current page number (as in headers)
\chftn     auto numbered footnote reference
           (footnote to follow in a group)
\chpict    placeholder character for picture
           (picture to follow in a group)
\chdate    current date (as in headers)
\chtime    current time (as in headers)
\|         formula character
\~         non-breaking space
\-         non-required hyphen
\_         non-breaking hyphen
\page      required page break
\line      required line break (no paragraph break)
\par       end of paragraph
\sect      end of section and end of paragraph
\tab       same as ASCII 9
For simplicity of operation, the ASCII codes 9 and 10 will be
accepted as \tab and \par respectively. ASCII 13 will be ignored.
The control code \<10> will be ignored. It may be used to include
"soft" carriage returns for easier readability but which will have
no effect on the interpretation.
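These rules can be applied as a pre-pass over the raw text. The Python sketch below is one possible reading, with the function name normalize_whitespace assumed for the example; it rewrites ASCII 9 and 10 as explicit control words, drops ASCII 13, and drops a backslash followed by ASCII 10.

```python
def normalize_whitespace(rtf):
    """Make tabs and line ends explicit; drop soft returns and ASCII 13."""
    out = []
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf):
            nxt = rtf[i + 1]
            if nxt != "\n":
                # Keep the backslash and the next character together so
                # that an escaped backslash is never split apart.
                out.append(ch + nxt)
            # A backslash followed by ASCII 10 is a soft return: ignored.
            i += 2
            continue
        if ch == "\t":
            out.append("\\tab ")
        elif ch == "\n":
            out.append("\\par ")
        elif ch != "\r":
            out.append(ch)
        i += 1
    return "".join(out)
```

A backslash followed by ASCII 13 is not covered by the text above and gets no special treatment in this sketch.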
4. DESTINATIONS
The change of destination will reset all properties to default.
Changes are legal only at the beginning of a group (by group here
we mean the text and controls enclosed in braces).
\rtf<param>
The destination is the document. The parameter is the
version number of the writer. This destination, preceded
by {, marks the beginning of an RTF document, and the
corresponding } marks the end.
Legal only once, after the initial {. Small scale interchange of
RTF where other methods for marking the end of string are
available, as in a string constant, need not include this
identification but will start with this destination as the default.
\pict
The destination is a picture. The group must immediately
follow a \chpict character. The plain text describes
the picture as a hex dump (string of characters 0,1,...
9, a, ..., e, f.)
(Formatting properties to determine data interpretation, size)
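Turning the hex dump back into bytes is straightforward. A minimal Python sketch, with the function name decode_pict_hex assumed for the example; dropping whitespace between digits is also an assumption, since the text above only lists the hex characters themselves.

```python
def decode_pict_hex(plain_text):
    """Convert the plain text of a \\pict group (hex digits) to bytes."""
    # Assumption: whitespace inside the dump carries no data and is dropped.
    digits = "".join(ch for ch in plain_text if not ch.isspace())
    # bytes.fromhex raises ValueError on an odd digit count or a non-hex char.
    return bytes.fromhex(digits)
```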
\footnote
The destination is a footnote text. The group must
immediately follow the footnote reference character(s).
\header
The destination is the header text for the current section.
The group must precede the first plain text character
in the section.
\headerl
Same as above, but header for left-hand pages.
\headerr
Same as above, but header for right-hand pages.
\headerf
Same as above, but header for first page.
\footer
Same as above, but footer.
\footerl
Same as above, but footer for left-hand pages.
\footerr
Same as above, but footer for right-hand pages.
\footerf
Same as above, but footer for first page.
\ftnsep
Same as above, but text is footnote separator
\ftnsepc
Same as above, but text is separator for continued footnotes.
\ftncn
Same as above, but text is continued footnote notice.
\info
text is the information block for the document. Parts of the
text are further classified by "properties" of the text
that are listed below - such as "title". These are not
formatting properties, but a device to delimit and identify
parts of the info from the text in the group.
\stylesheet
text is the style sheet for the document.
More precisely, each piece of text between semicolons is taken
to be a style name, which will be defined to stand for the
formatting properties which are in effect.
\fonttbl
font table. See below.
\colortbl
color table. See below.
\comment
text will be ignored.
5. DOCUMENT FORMATTING PROPERTIES
(000 stands for a number which may be signed; a trailing value is the default)
\paperw000     paper width in twips                12240
\paperh000     paper height                        15840
\margl000      left margin                         1800
\margr000      right margin                        1800
\margt000      top margin                          1440
\margb000      bottom margin                       1440
\facingp       facing pages
\gutter000     gutter width
\deftab000     default tab width                   720
\widowctrl     enable widow control
\endnotes      footnotes at end of section
\ftnbj         footnotes at bottom of page         default
\ftntj         footnotes beneath text (top just)
\ftnstart000   starting footnote number            1
\ftnrestart    restart footnote numbers each page
\pgnstart000   starting page number                1
\linestart000  starting line number                1
\landscape     printed in landscape format
(the "next file" property will be encoded in the info text )
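The measurements above are in twips. A twip is 1/20 of a point, or 1/1440 of an inch, so the default paper size works out to 8.5 by 11 inches with 1.25 inch side margins. A small Python sketch of the conversions; the function names are assumptions made for the example.

```python
TWIPS_PER_POINT = 20
TWIPS_PER_INCH = 1440   # 72 points per inch * 20 twips per point


def twips_to_inches(twips):
    return twips / TWIPS_PER_INCH


def twips_to_points(twips):
    return twips / TWIPS_PER_POINT


def inches_to_twips(inches):
    # RTF parameters are whole numbers, so round to the nearest twip.
    return round(inches * TWIPS_PER_INCH)
```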
6. SECTION FORMATTING PROPERTIES
\sectd         reset to default section properties
\nobreak       break code
\colbreak      break code                              default
\pagebreak     break code
\evenbreak     break code
\oddbreak      break code
\pgnrestart    restart page numbers at 1
\pgndec        page number format decimal              default
\pgnucrm       page number format uc roman
\pgnlcrm       page number format lc roman
\pgnucltr      page number format uc letter
\pgnlcltr      page number format lc letter
\pgnx000       auto page number x pos                  720
\pgny000       auto page number y pos                  720
\linemod000    line number modulus
\linex000      line number - text distance             360
\linerestart   line number restart at 1                default
\lineppage     line number restart on each page
\linecont      line number continued from prev section
\headery000    header y position from top of page      720
\footery000    footer y position from bottom of page   720
\cols000       number of columns                       1
\colsx000      space between columns                   720
\endnhere      include endnotes in this section
\titlepg       title page is special
7. PARAGRAPH FORMATTING PROPERTIES
\pard     reset to default para properties
\s000     style
\ql       quad left               default
\qr       quad right
\qj       justified
\qc       centered
\fi000    first line indent
\li000    left indent
\ri000    right indent
\sb000    space before
\sa000    space after
\sl000    space between lines
\keep     keep
\keepn    keep with next para
\sbys     side by side
\pagebb   page break before
\noline   no line numbering
\brdrt    border top
\brdrb    border bottom
\brdrl    border left
\brdrr    border right
\box      border all around
\brdrs    single thickness
\brdrth   thick
\brdrsh   shadow
\brdrdb   double
\tx000    tab position
\tqr      right flush tab (these apply to last specified pos)
\tqc      centered tab
\tqdec    decimal aligned tab
\tldot    leader dots
\tlhyph   leader hyphens
\tlul     leader underscore
\tlth     leader thick line
8. CHARACTER FORMATTING PROPERTIES
\plain    reset to default text properties
\b        bold
\i        italic
\strike   strikethrough
\outl     outline
\shad     shadow
\scaps    small caps
\caps     all caps
\v        invisible text
\f000     font number n
\fs000    font size in half points        24
\ul       underline
\ulw      word underline
\uld      dotted underline
\uldb     double underline
\up000    superscript in half points
\dn000    subscript in half points
9. INFO GROUP
The plain text in the group is used to specify the various
fields of the information block. The current field may be
thought of as a particular setting of the "sub-destination"
property of the text.
\title      following plain text is the title
\subject    following text is the subject
\operator
\author
\keywords
\doccomm    comments (not to be confused with \comment)
\version
\nextfile   following text is name of "next" file
The other properties assign their parameters directly to the
info block.
\verno000      internal version number
\creatim       creation time follows
\yr000         year to be assigned to previously specified time field
\mo000
\dy000
\hr000
\min000
\sec000
\revtim        revision time follows
\printtim      print time follows
\buptim        backup time follows
\edmins000     editing minutes
\nofpages000
\nofwords000
\nofchars000
\id000         internal ID number
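The time controls work in two steps: a control such as \creatim selects a time field, and the \yr, \mo, \dy, \hr, \min and \sec controls that follow assign to the field selected last. A Python sketch of that bookkeeping, assuming the controls have already been parsed into (word, parameter) pairs; the function name collect_times and the dictionary layout are made up for the example.

```python
TIME_FIELDS = {"creatim", "revtim", "printtim", "buptim"}
TIME_PARTS = {"yr", "mo", "dy", "hr", "min", "sec"}


def collect_times(controls):
    """Group time parts under the most recently named time field."""
    times = {}
    current = None
    for word, param in controls:
        if word in TIME_FIELDS:
            current = word            # later parts belong to this field
            times[current] = {}
        elif word in TIME_PARTS and current is not None:
            times[current][word] = param
        # Parts seen before any time field, and other controls, are skipped.
    return times
```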